General instructions for all assignments:
Upload your R Markdown file (named as: [AndrewID]-HW09.Rmd, e.g. “sventura-HW09.Rmd”) to the Homework 09 submission section on Blackboard. You do not need to upload the .html file.
(4 points)
Organization, Themes, and HTML Output
warning = FALSE and message = FALSE in every code block.
ggplot theme and use of color:
ggplot() color scheme.
color = "black").
library(tidyverse)
library(data.table)
library(forcats)
# Simple theme with white background and custom text styling
my_theme <- theme_bw() +
  theme(axis.text = element_text(size = 12, color = "indianred4"),
        text = element_text(size = 14, face = "bold", color = "darkslategrey"))
# Colorblind-friendly color palette
my_colors <- c("#000000", "#56B4E9", "#E69F00", "#F0E442", "#009E73", "#0072B2",
               "#D55E00", "#CC79A7")
(3 points each)
Read
Read this article. Write 1-3 sentences about what you learned from it.
Read this article. Write 1-3 sentences about what you learned from it.
Read the in-depth description of the ggmap package in the short paper by David Kahle and Hadley Wickham here. Write 1-3 sentences about what you learned from it.
Read the article on ggmap here. Which functions can you use to create geographic heat maps?
(2 points each)
Maps with ggmap
Install and load the ggmap package. This package can be used to access maps from Google’s Maps API.
Look at the help documentation for the get_map() function. What does it do? What are the different map sources that can be used in get_map()?
In the help documentation, describe the zoom parameter. Roughly, what would be an appropriate value of this parameter if we wanted to display a square with width 1 mile? (Just a rough estimate is fine; an exact number is not required.)
In the help documentation, what are the different maptype values that can be used? Which of these is unique to Google Maps?
What does the map in the following code block show? Describe it. Explain what each of the parameters in the get_map() and ggmap() functions is doing.
(Note: Before doing this, you may need to install the most updated versions of these packages from GitHub – see commented code below.)
#devtools::install_github('hadley/ggplot2')
#devtools::install_github('thomasp85/ggforce')
#devtools::install_github('thomasp85/ggraph')
#devtools::install_github('slowkow/ggrepel')
library(ggmap)
map_base <- get_map(location = c(lon = -79.944248, lat = 40.4415861),
                    color = "color",
                    source = "google",
                    maptype = "hybrid",
                    zoom = 16)
map_object <- ggmap(map_base,
                    extent = "device",
                    ylab = "Latitude",
                    xlab = "Longitude")
map_object
Recreate the map in part (d). Try changing the zoom parameter to a non-integer value (e.g. 16.5). What happens?
Type class(map_object). What kind of object is your map?
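For the zoom question above, a rough estimate can be worked out from the Web Mercator ground resolution instead of by trial and error. The sketch below assumes the standard 256-pixel tile scheme (about 156543.03 meters per pixel at zoom 0 on the equator) and a 640-pixel-wide image, which we believe is the default Google static map size in ggmap. `zoom_for_width()` is a hypothetical helper written for this illustration; it is not part of ggmap.

```r
# Hypothetical helper: zoom level at which a map image of `map_px` pixels
# spans roughly `width_m` meters at latitude `lat_deg` (Web Mercator).
# Assumption: 156543.03392 m/pixel at zoom 0 on the equator (256-px tiles).
zoom_for_width <- function(width_m, lat_deg, map_px = 640) {
  m_per_px_zoom0 <- 156543.03392 * cos(lat_deg * pi / 180)
  # Resolution halves with each zoom step, so solve
  # map_px * m_per_px_zoom0 / 2^zoom = width_m for zoom
  log2(map_px * m_per_px_zoom0 / width_m)
}

# One mile (1609.344 m) at the latitude used in the get_map() call above
zoom_estimate <- zoom_for_width(1609.344, 40.4415861)
zoom_estimate
```

Under these assumptions the estimate falls between 15 and 16, so either integer zoom is a reasonable rough answer.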
(2 points each)
Finding Latitudes and Longitudes
There are many ways to find latitude and longitude coordinates of specific places. Here’s one easy way:
Go to Google Maps. Type in “times square, nyc” and hit enter. The map should center around New York City. Now, look at the URL in your internet browser. After the @ symbol, the latitude and longitude of the center of the map are displayed (in order). What is the latitude of the map centered on Times Square? What is the longitude?
After the latitude and longitude, the zoom level is displayed (e.g. “17z”). Change this to zoom level 12, and delete any text to the right. This should give you a map that displays most of New York City. Do the latitude/longitude coordinates change when you do this?
Using the code from Problem 3d as a template, create a black and white (color = "bw") map of NYC in R, centered near Times Square, at a zoom level of 12, and with a roadmap map type. Describe the map that is output in R.
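The coordinates in a Google Maps URL can also be pulled out programmatically, which is handy when collecting several locations. The sketch below uses only base R. The URL is an illustrative example with rounded coordinates near Times Square, and `parse_maps_url()` is a hypothetical helper that assumes the URL contains an `@lat,lon,zoomz` segment as described above.

```r
# Hypothetical helper: extract latitude, longitude, and zoom from the
# "@lat,lon,zoomz" segment of a Google Maps URL.
parse_maps_url <- function(url) {
  pattern <- "@(-?[0-9.]+),(-?[0-9.]+),([0-9.]+)z"
  m <- regmatches(url, regexec(pattern, url))[[1]]
  if (length(m) == 0) stop("no '@lat,lon,zoomz' segment found in URL")
  c(lat = as.numeric(m[2]), lon = as.numeric(m[3]), zoom = as.numeric(m[4]))
}

# Illustrative URL (coordinates rounded; not copied from a real session)
example_url <- "https://www.google.com/maps/place/Times+Square/@40.7580,-73.9855,17z"
center <- parse_maps_url(example_url)
center

# The values map directly onto get_map(location = c(lon = ..., lat = ...), zoom = ...)
location_arg <- c(lon = unname(center["lon"]), lat = unname(center["lat"]))
```

Note that Google Maps URLs list latitude first, while get_map() expects longitude first in its location vector.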
For both problems below, it may help to look over:
Both files are located under Course Content.
(30 points total)
Mapping US Flights
airports <- read_csv("https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat",
                     col_names = c("ID", "name", "city", "country", "IATA_FAA",
                                   "ICAO", "lat", "lon", "altitude", "timezone", "DST"))
routes <- read_csv("https://raw.githubusercontent.com/jpatokal/openflights/master/data/routes.dat",
                   col_names = c("airline", "airlineID", "sourceAirport",
                                 "sourceAirportID", "destinationAirport",
                                 "destinationAirportID", "codeshare", "stops",
                                 "equipment"))
departures <- routes %>%
  dplyr::group_by(sourceAirportID) %>%
  dplyr::summarize(flights = n()) %>%
  mutate(sourceAirportID = as.integer(as.vector(sourceAirportID)))
arrivals <- routes %>%
  dplyr::group_by(destinationAirportID) %>%
  dplyr::summarize(flights = n()) %>%
  mutate(destinationAirportID = as.integer(as.vector(destinationAirportID)))
# Merge each of the arrivals/departures data.frames with the airports data.frame above
airportD <- left_join(airports, departures, by = c("ID" = "sourceAirportID"))
airportA <- left_join(airports, arrivals, by = c("ID" = "destinationAirportID"))
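The count-then-join pattern above is easier to reason about on a toy table. The sketch below reproduces it in base R (so it runs without the tidyverse) on made-up data. It assumes, as we understand the OpenFlights files, that missing airport IDs appear as the text `\N`, which is why the ID column is character and needs the integer conversion. All `*_toy` names and values are hypothetical.

```r
# Toy stand-ins for the airports and routes tables (values are made up)
airports_toy <- data.frame(ID = c(1L, 2L, 3L),
                           IATA_FAA = c("AAA", "BBB", "CCC"),
                           stringsAsFactors = FALSE)
routes_toy <- data.frame(destinationAirportID = c("2", "\\N", "1", "1"),
                         stringsAsFactors = FALSE)

# Count routes per destination ID (the group_by() + summarize(n()) step)
arrivals_toy <- as.data.frame(
  table(destinationAirportID = routes_toy$destinationAirportID),
  responseName = "flights",
  stringsAsFactors = FALSE
)

# Convert IDs to integer; the "\N" placeholder becomes NA (warning suppressed)
arrivals_toy$destinationAirportID <-
  suppressWarnings(as.integer(arrivals_toy$destinationAirportID))

# Left join: keep every airport, even those with no arriving routes
airportA_toy <- merge(airports_toy, arrivals_toy,
                      by.x = "ID", by.y = "destinationAirportID",
                      all.x = TRUE)
airportA_toy
```

Airport 3 has no arriving routes, so its `flights` value is NA after the join. In the real plot, geom_point() drops such rows with a warning unless they are filtered out or set to 0 first.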
map <- get_map(location = 'United States', zoom = 4)
mapPoints <- ggmap(map) +
  geom_point(aes(x = lon, y = lat, size = flights),
             data = airportA) +
  ggtitle("Location of Airports Sized by Number of Arriving Flights")
# Add a custom legend to the plot
mapPointsLegend <- mapPoints +
  scale_size_area(breaks = c(10, 50, 100, 500, 900),
                  labels = c(10, 50, 100, 500, 900),
                  name = "Number of Arriving Routes")
mapPointsLegend
my_airport_code <- "LAX"
lax_routes <- dplyr::filter(routes,
                            sourceAirport == my_airport_code |
                              destinationAirport == my_airport_code)
lax_airport <- lax_routes %>%
  left_join(airports, by = c("sourceAirport" = "IATA_FAA")) %>%
  dplyr::select(destinationAirport, lat, lon, timezone) %>%
  dplyr::rename(source_lat = lat, source_lon = lon, source_timezone = timezone) %>%
  left_join(airports, by = c("destinationAirport" = "IATA_FAA")) %>%
  dplyr::select(source_lat, source_lon, source_timezone, lat, lon, timezone) %>%
  dplyr::rename(dest_lat = lat, dest_lon = lon, dest_timezone = timezone)
mapPointsLegend +
  geom_segment(aes(x = source_lon, y = source_lat, xend = dest_lon,
                   yend = dest_lat), data = lax_airport, alpha = 0.15) +
  labs(x = "Longitude", y = "Latitude",
       title = "Flights To and From Los Angeles") +
  theme_void()
mapPointsLegend +
  geom_curve(aes(x = source_lon, y = source_lat, xend = dest_lon,
                 yend = dest_lat), data = lax_airport,
             arrow = arrow(length = unit(0.02, "npc")), alpha = 0.15) +
  labs(x = "Longitude", y = "Latitude",
       title = "Flights To and From Los Angeles") +
  coord_cartesian()
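# Compute the time-zone change for each route, then keep only the routes
# with a change of at most 3 hours (the routes that will be shown on the map)
lax_airport$change_timezone <- lax_airport$source_timezone - lax_airport$dest_timezone
lax_airport <- lax_airport[which(abs(lax_airport$change_timezone) <= 3), ]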
mapPointsLegend +
  geom_curve(aes(x = source_lon, y = source_lat, xend = dest_lon,
                 yend = dest_lat, color = change_timezone), data = lax_airport,
             arrow = arrow(length = unit(0.02, "npc"))) +
  labs(x = "Longitude", y = "Latitude",
       title = "Flights To and From Los Angeles",
       color = "Change in \nTime Zone \nin Hours") +
  coord_cartesian() +
  scale_color_gradient(low = "blue", high = "red")
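The time-zone filter above wraps its condition in which(), and that choice matters when some airports have a missing time zone. The base R sketch below uses made-up values (all `*_toy` names are hypothetical) to show the difference between indexing with which() and indexing with the raw logical vector.

```r
# Toy routes: the last one has a missing source time zone (values are made up)
routes_toy <- data.frame(source_timezone = c(-8, -8, -8, NA),
                         dest_timezone   = c(-5,  1, -8, -6))
routes_toy$change_timezone <- routes_toy$source_timezone - routes_toy$dest_timezone

# Logical condition: TRUE, FALSE, TRUE, NA
keep <- abs(routes_toy$change_timezone) <= 3

# which() returns only the positions that are TRUE, silently dropping NA
with_which <- routes_toy[which(keep), ]

# Indexing with the raw logical vector keeps an all-NA row for the NA entry
without_which <- routes_toy[keep, ]

nrow(with_which)     # rows that satisfy the condition
nrow(without_which)  # same rows plus one row filled with NA
```

With which(), routes whose time-zone change cannot be computed are dropped along with the long-haul routes; dplyr::filter() behaves the same way for NA conditions.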
(40 points)
Choropleth Maps of Rent Prices
rent <- read_csv("https://raw.githubusercontent.com/sventura/315-code-and-datasets/master/data/price.csv")
## Parsed with column specification:
## cols(
## .default = col_integer(),
## City = col_character(),
## Metro = col_character(),
## County = col_character(),
## State = col_character()
## )
## See spec(...) for full column specifications.
rent_jan2017 <- rent %>%
  select(County, State, `January 2017`) %>%
  rename(jan_2017 = `January 2017`) %>%
  arrange(State) %>%
  group_by(State) %>%
  summarize(mean_rent = mean(jan_2017))
# tibble() replaces the deprecated data_frame()
state_data <- tibble(state.abb, state.name) %>%
  mutate(state.name = tolower(state.name)) %>%
  left_join(rent_jan2017, by = c("state.abb" = "State"))
state_borders <- map_data("state") %>%
  left_join(state_data, by = c("region" = "state.name"))
# Group by the polygon id (group), not the state abbreviation, so that states
# made of several polygons (e.g. Michigan) are not joined by stray lines
ggplot(state_borders, aes(x = long, y = lat, fill = mean_rent)) +
  geom_polygon(aes(group = group), color = "black") +
  theme_void() +
  coord_map("mercator") +
  scale_fill_gradient2(high = "darkred", low = "darkblue",
                       mid = "white", midpoint = 1500) +
  labs(title = "Mean Rent per State, Jan. 2017",
       fill = "Mean Rent ($)")
The states with the highest average rent appear to be California, Massachusetts, and New Jersey, while the lowest-rent states include Oklahoma, West Virginia, and Arkansas. Average rents tend to be higher along the Pacific coast and in New England, and lower in the Deep South, the Midwest, and the Mountain time zone.
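One detail of the state lookup used for this map is worth checking: the built-in state.abb and state.name vectors cover the 50 states only, so a rent table that includes "DC" loses that row in the left join. The base R sketch below illustrates this with made-up rent values. The `*_toy` names are hypothetical, and the "district of columbia" region name is what we understand map_data("state") to use.

```r
# Lookup table built from R's built-in state vectors (50 states, no DC)
state_lookup <- data.frame(state.abb,
                           state.name = tolower(state.name),
                           stringsAsFactors = FALSE)

# Toy state-level rents (values are made up)
rent_toy <- data.frame(State = c("PA", "CA", "DC"),
                       mean_rent = c(1200, 2500, 2300),
                       stringsAsFactors = FALSE)

# Left join from the lookup: the DC row has nothing to attach to
joined <- merge(state_lookup, rent_toy,
                by.x = "state.abb", by.y = "State", all.x = TRUE)

# One possible fix: add DC to the lookup before joining
state_lookup_dc <- rbind(state_lookup,
                         data.frame(state.abb = "DC",
                                    state.name = "district of columbia",
                                    stringsAsFactors = FALSE))
joined_dc <- merge(state_lookup_dc, rent_toy,
                   by.x = "state.abb", by.y = "State", all.x = TRUE)

sum(!is.na(joined$mean_rent))     # matched rows without DC in the lookup
sum(!is.na(joined_dc$mean_rent))  # matched rows with DC added
```

Whether this affects the map above depends on whether the rent data actually contains a DC row, which is not shown here.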
rent_jan2017 <- rent %>%
  select(County, State, `January 2017`) %>%
  rename(jan_2017 = `January 2017`) %>%
  arrange(County) %>%
  group_by(County) %>%
  summarize(mean_rent = mean(jan_2017)) %>%
  mutate(County = tolower(County))
county_borders <- map_data("county") %>%
  left_join(rent_jan2017, by = c("subregion" = "County"))
# theme_bw() must come before theme(); a complete theme added last would
# override the centered title
ggplot(county_borders, aes(x = long, y = lat, fill = mean_rent)) +
  geom_polygon(aes(group = group)) +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5)) +
  coord_map("mercator") +
  scale_fill_gradient2(high = "darkred", low = "darkblue",
                       mid = "white", midpoint = 2000) +
  labs(title = "Mean Rent per County, Jan. 2017",
       fill = "Mean Rent ($)")
The highest county-level average monthly rents are predominantly on the coasts. Above-average monthly rents are concentrated in a few counties in California, New York, New Jersey, and Florida. There also seem to be a few high-rent counties in the Midwest. These are substantially higher than most of the country: most other counties have average monthly rent below 2000 dollars, while these counties are at 3000 dollars and above. A lot of data appears to be missing for the Great Plains states.
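A caveat on the county-level summary: it groups and joins on the county name alone, and many county names occur in more than one state. The base R sketch below uses made-up rents (all names and values are hypothetical) to show how averaging by name alone blends unrelated counties, and how keeping the state in the key avoids it.

```r
# Toy rents: "Washington" county exists in two different states (values made up)
rent_toy <- data.frame(County = c("Washington", "Washington", "Allegheny"),
                       State  = c("PA", "OR", "PA"),
                       rent   = c(1000, 1800, 1200),
                       stringsAsFactors = FALSE)

# Averaging by county name alone merges the two Washington counties
by_name <- aggregate(rent ~ County, data = rent_toy, FUN = mean)

# Averaging by county and state keeps them separate
by_both <- aggregate(rent ~ County + State, data = rent_toy, FUN = mean)

by_name
by_both

# With the real data, the join to map_data("county") would then need both keys,
# e.g. its region (lowercase state name) and subregion (lowercase county name)
# columns, after converting the state abbreviation to a full lowercase name.
```

In the toy data, both Washington counties would be colored with the blended value of 1400 if only the name were used, although their rents are 1000 and 1800.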
See the BonusProblems assignment on Blackboard.